FIGURE 4.11
The main framework of the proposed DCP-NAS, where α and α̂ denote the real-valued and binary architectures, respectively. We first conduct the real-valued NAS in a single round and generate the corresponding tangent direction. Then we learn a discrepant binary architecture via tangent propagation. In this process, the real-valued and binary networks inherit architectures from their counterparts in turn.

where ⊗ denotes the convolution operation. We omit the batch normalization (BN) and activation layers for simplicity. Based on this, a standard NAS problem is given as

\max_{w \in \mathcal{W},\, \alpha \in \mathcal{A}} f(w, \alpha),    (4.20)

where f : W × A → R is a differentiable objective function w.r.t. the network weight w ∈ W and the architecture α ∈ A ⊂ R^{M×E}, where E and M denote the number of edges and operators, respectively. Considering that directly optimizing f(w, α) is a black-box problem, we relax the objective function to f̃(w, α) as the objective of NAS

\min_{w \in \mathcal{W},\, \alpha \in \mathcal{A}} \mathcal{L}_{\mathrm{NAS}} = \tilde{f}(w, \alpha) = -\sum_{n=1}^{N} p_n(X) \log\big(p_n(w, \alpha)\big),    (4.21)

where N denotes the number of classes and X is the input data. f̃(w, α) represents the performance of a specific architecture with real-valued weights, where p_n(X) and p_n(w, α) denote the true distribution and the distribution of the network prediction, respectively.
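To make the relaxed objective concrete, below is a minimal sketch of how such a differentiable search objective can be evaluated in PyTorch, in the spirit of DARTS-style weight sharing: each edge mixes its M candidate operations with softmax-weighted architecture parameters α, and f̃(w, α) is the cross-entropy between the labels and the supernet prediction. The names (MixedOp, TinySupernet, nas_loss), the candidate operation set, and all shapes are illustrative assumptions, not the actual DCP-NAS implementation.

# Illustrative sketch of the relaxed NAS objective in Eq. (4.21); assumptions noted above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixedOp(nn.Module):
    """One edge of the cell: a softmax-weighted sum of M candidate operations."""

    def __init__(self, candidate_ops):
        super().__init__()
        self.ops = nn.ModuleList(candidate_ops)

    def forward(self, x, alpha_edge):
        # alpha_edge: (M,) architecture logits for this edge.
        weights = F.softmax(alpha_edge, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))


class TinySupernet(nn.Module):
    """A toy weight-sharing supernet with E edges, each holding M candidate operations."""

    def __init__(self, channels=16, num_classes=10, num_edges=4):
        super().__init__()

        def candidates():
            return [
                nn.Conv2d(channels, channels, 3, padding=1, bias=False),
                nn.Conv2d(channels, channels, 1, bias=False),
                nn.Identity(),
            ]

        self.stem = nn.Conv2d(3, channels, 3, padding=1, bias=False)
        self.edges = nn.ModuleList(MixedOp(candidates()) for _ in range(num_edges))
        # Architecture parameters alpha in R^{E x M} (one row of logits per edge).
        self.alpha = nn.Parameter(1e-3 * torch.randn(num_edges, len(candidates())))
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, x):
        h = self.stem(x)
        for edge, alpha_edge in zip(self.edges, self.alpha):
            h = edge(h, alpha_edge)
        h = F.adaptive_avg_pool2d(h, 1).flatten(1)
        return self.classifier(h)


def nas_loss(model, images, labels):
    # f~(w, alpha): cross-entropy between the true distribution p_n(X) (the labels)
    # and the network prediction p_n(w, alpha), as in Eq. (4.21).
    return F.cross_entropy(model(images), labels)

In a bilevel search, w and α would typically be updated on separate data splits in alternating steps; the sketch only shows how the relaxed loss of Eq. (4.21) is evaluated for a given (w, α).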

Binary neural architecture search
The 1-bit model aims to quantize ŵ and â_in into b_ŵ ∈ {−1, +1}^{C_out×C_in×K×K} and b_â_in ∈ {−1, +1}^{C_in×H×W} using the efficient XNOR and bit-count operations to replace full-precision operations. Following [48], the forward process of the 1-bit CNN is

\hat{a}_{out} = \beta \circ (b_{\hat{a}_{in}} \odot b_{\hat{w}}),    (4.22)

where ⊙ denotes the XNOR and bit-count operations and ∘ denotes channel-wise multiplication. β = [β_1, · · · , β_{C_out}] ∈ R_+^{C_out} is the vector consisting of the channel-wise scale factors. b = sign(·) denotes the binarized variable obtained with the sign function, which returns +1 if the input is greater than zero and −1 otherwise.
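As an illustration, the following is a minimal sketch of this binarized forward pass, assuming a PyTorch backend. The XNOR and bit-count kernel is emulated with an ordinary convolution over {−1, +1} tensors, the backward pass uses a common straight-through estimator with clipping, and the names BinarizeSTE and BinaryConv2d are hypothetical rather than part of any official DCP-NAS code.

# Illustrative sketch of the 1-bit forward pass in Eq. (4.22); assumptions noted above.
import torch
import torch.nn as nn
import torch.nn.functional as F


class BinarizeSTE(torch.autograd.Function):
    """sign(.) in the forward pass, straight-through estimator in the backward pass."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        # Map x > 0 to +1 and x <= 0 to -1 (torch.sign alone would return 0 at 0).
        return torch.where(x > 0, torch.ones_like(x), -torch.ones_like(x))

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Pass gradients only where |x| <= 1, a common STE clipping rule.
        return grad_output * (x.abs() <= 1).float()


class BinaryConv2d(nn.Module):
    def __init__(self, c_in, c_out, kernel_size, padding=0):
        super().__init__()
        self.weight = nn.Parameter(0.01 * torch.randn(c_out, c_in, kernel_size, kernel_size))
        # Channel-wise scale factors beta in R_+^{C_out}.
        self.beta = nn.Parameter(torch.ones(c_out))
        self.padding = padding

    def forward(self, a_in):
        b_a = BinarizeSTE.apply(a_in)         # b_{a_in} in {-1, +1}^{C_in x H x W}
        b_w = BinarizeSTE.apply(self.weight)  # b_w in {-1, +1}^{C_out x C_in x K x K}
        # Emulate the XNOR and bit-count operations with a real-valued convolution
        # of the binarized tensors.
        out = F.conv2d(b_a, b_w, padding=self.padding)
        # Channel-wise multiplication by beta (broadcast over N, H, W).
        return self.beta.view(1, -1, 1, 1) * out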

The output then passes through several non-linear layers, e.g.,